Detecting (Un)Important Content for Single-Document News Summarization
Abstract
We present a robust approach for detecting intrinsic sentence importance in news, by training on two corpora of document-summary pairs. When used for single-document summarization, our approach, combined with the “beginning of document” heuristic, outperforms a state-of-the-art summarizer and the beginning-of-article baseline in both automatic and manual evaluations. These results represent an important advance because, in the absence of cross-document repetition, single-document summarizers for news have not been able to consistently outperform the strong beginning-of-article baseline.
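For readers unfamiliar with the beginning-of-article baseline mentioned above, the following is a minimal sketch of one common way to build it: take the leading sentences of the article until a word budget is used up. The function names, the naive regex sentence splitter, and the 100-word default are assumptions made for illustration; the abstract does not specify how the baseline or the “beginning of document” heuristic was implemented.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def lead_baseline(text, max_words=100):
    """Return the leading sentences of `text`, staying within `max_words` words.

    The first sentence is always kept, even if it alone exceeds the budget,
    so a non-empty article never yields an empty summary.
    """
    summary, used = [], 0
    for sentence in split_sentences(text):
        n_words = len(sentence.split())
        if summary and used + n_words > max_words:
            break
        summary.append(sentence)
        used += n_words
    return " ".join(summary)


if __name__ == "__main__":
    article = (
        "A storm hit the coast on Monday. Thousands lost power. "
        "Crews worked overnight. Officials expect repairs to take days."
    )
    print(lead_baseline(article, max_words=10))
```

Because news articles tend to front-load their key facts, a baseline of this kind is hard to beat, which is the difficulty the abstract refers to.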
Similar resources
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Combining Syntax and Semantics for Automatic Extractive Single-Document Summarization
The goal of automated summarization is to tackle the “information overload” problem by extracting and perhaps compressing the most important content of a document. Due to the difficulty that single-document summarization has in beating a standard baseline, especially for news articles, most efforts are currently focused on multi-document summarization. The goal of this study is to reconsider the...
Automatic Multi Document Summarization Approaches
Problem statement: Text summarization can be of different nature ranging from indicative summary that identifies the topics of the document to informative summary which is meant to represent the concise description of the original document, providing an idea of what the whole content of document is all about. Approach: Single document summary seems to capture both the information well but it ha...
TAP-DLND 1.0: A Corpus for Document Level Novelty Detection
Detecting novelty of an entire document is an Artificial Intelligence (AI) frontier problem that has widespread NLP applications, such as extractive document summarization, tracking development of news events, predicting impact of scholarly articles, etc. Important though the problem is, we are unaware of any benchmark document level data that correctly addresses the evaluation of automatic nov...
Generic technologies for single- and multi-document summarization
The technologies for single- and multi-document summarization that are described and evaluated in this article can be used on heterogeneous texts for different summarization tasks. They refer to the extraction of important sentences from the documents, compressing the sentences to their essential or relevant content, and detecting redundant content across sentences. The technologies are tested at...